In this paper we study the solution space structure of model RB, a standard prototype of Constraint Satisfaction Problems (CSPs) with growing domains. Using the rigorous first and second moment methods, we show that in the solvable phase close to the satisfiability transition, solutions are clustered into an exponential number of well-separated clusters, with each cluster containing a sub-exponential number of solutions. As a consequence, the system has a clustering (dynamical) transition but no condensation transition. This picture of the phase diagram is different from that of other classic random CSPs with fixed domain size, such as random K-Satisfiability (K-SAT) and graph coloring problems, where the condensation transition exists and is distinct from the satisfiability transition. Our result verifies some non-rigorous results obtained using the cavity method from spin glass theory.

PACS numbers: 89.75.Fb, 02.50.-r, 64.70.P-, 89.20.Ff 


I. INTRODUCTION 

Constraint satisfaction problems (CSPs) are defined by a set of discrete variables whose assignments must satisfy a collection of constraints. A CSP instance is said to be satisfiable if there exists a solution, i.e. an assignment to all variables that satisfies all the constraints. The core question for CSPs is to decide whether a given instance is satisfiable. CSPs have been studied extensively in mathematics and computer science, and play an important role in computational complexity theory. Most of the interesting CSPs, such as Boolean K-satisfiability and graph coloring problems, are NP-complete: in the worst case, the time required by known algorithms to decide whether a solution exists grows very quickly with the size of the instance.

In recent years there has been much interest in the average-case complexity of CSPs, that is, the computational complexity of random ensembles of CSPs. This question has also drawn considerable attention in statistical physics, especially in the field of spin glasses. From a statistical physics viewpoint, finding solutions of a CSP amounts to finding the ground-state configurations of a spin glass at zero temperature, where the energy represents the number of violated constraints. Most interesting CSPs also display spin glass behavior in the thermodynamic limit (with number of variables $N \to \infty$ and number of constraints $M \to \infty$), and encounter a set of phase transitions when the constraint density $c = M/N$ increases. The first transition that attracted wide interest is the satisfiability transition $c_s$ m, where the probability of a random instance being satisfiable changes sharply from 1 to 0. In the satisfiable phase (the parameter regime in which random instances are solvable w.h.p. [†]), studies using the cavity method [11[5] from spin glass theory tell us that the solution space of CSPs is highly structured: as $c$ increases, the system undergoes a clustering transition, a condensation transition and finally the satisfiability transition [SHE]. All of these transitions are connected to the fact that solutions are grouped into clusters. The clustering phenomenon is believed to affect the performance of solution-finding algorithms and to be responsible for the hardness of CSPs IH].

Besides heuristic analysis using the cavity method, rigorous mathematical studies have also made considerable progress on the satisfiability transition and the clustering of solutions in CSPs. Some CSP models have been proven to have a satisfiability transition, such as K-XORSAT and K-SAT with growing clause length; some have been proven to have a clustering phase, such as K-SAT ($K \ge 8$), K-coloring and hypergraph 2-coloring [HUTT]. Hypergraph 2-coloring has also been proven to have a condensation phase [12].


* Electronic address: lt@pku.edu.cn 

[†] "With high probability" (w.h.p.) means that the probability of some event tends to 1 as $N \to \infty$.





In this paper we study model RB m, a prototype CSP model with growing domains that is revised from the well-known CSP Model B [14]. The main difference between model RB and classic CSPs like satisfiability problems is that the number of states one variable can take (called the domain size here) is an increasing function of the number of variables. This is probably one of the reasons why the satisfiability threshold can be located rigorously m, and why the clustering of solutions is also provable, as we show in the main text of this paper. It has been shown that random instances of model RB are hard to solve close to the satisfiability transition [IKl - fTS], and benchmarks based on model RB (more information at http://www.nlsde.buaa.edu.cn/~kexu/) have been widely used in algorithmic research and in various algorithm competitions (e.g., CSP, SAT and MaxSAT) in recent years. Model RB has also been used or investigated in many different fields of computer science. The hardness of model RB makes the relation between its solution space structure and its hardness an interesting problem.

Using the cavity method, it has been shown [19] that the replica symmetric solution is always stable in the satisfiable phase, which suggests that the condensation transition does not exist in this problem. Here we use rigorous methods, namely the first and second moment methods unmn], to show that in the satisfiable phase close to the satisfiability transition, solutions are always clustered into an exponential number of clusters, each containing a sub-exponential number of solutions. We thus show rigorously that the system has no condensation transition.

The main contributions of this paper are twofold:

• From a mathematical point of view, we give a rigorous analysis of the geometry of solution clusters in model RB.

• From a statistical physics point of view, we show that there is no condensation transition in this problem. As a consequence, replica symmetric results, including the Bethe entropy and the marginals given by the cavity method and the associated Belief Propagation algorithm, should be asymptotically exact.


The rest of the paper is organized as follows. Section II gives the definition of model RB and briefly describes previously obtained results on its phase transitions. Section III contains our main results, which include a rigorous analysis of the clustering of solutions and of the number and diameter of clusters. We conclude in Section IV.


II. MODEL RB AND PHASE TRANSITIONS 

Random CSP models provide relatively “unbiased” samples for testing algorithms, help in designing better algorithms and heuristics, and provide insight into complexity theory. The standard random models (such as model B) suffer from (trivial) insolubility as the problem size increases, so models with varying scales of parameters were proposed to overcome this deficiency [2TH25]. Model RB is one of them, and it has a growing domain size. It is worth mentioning that CSPs with growing domains describe many practical problems better, for example the N-queens problem, the Latin square problem, sudoku, and the Golomb ruler problem.

Here is the definition of model RB. An instance of model RB contains $N$ variables, each of which takes values from its domain $D = \{1, 2, \ldots, d_N\}$, with $d_N = N^{\alpha}$. Note that the domain size $|D|$ grows polynomially with the system size $N$, and this is the main difference between model RB and classic CSPs like K-SAT. There are $M = rN\ln N$ constraints in one instance; each constraint involves $k$ ($k \ge 2$) different variables chosen uniformly at random from all variables. The total number of assignments of the variables involved in a constraint is $d_N^k$. For a constraint $a$ we pick at random $p\,d_N^k$ different assignments out of the $d_N^k$ possible ones to form an incompatible set $Q_a$. In other words, constraint $a$ is satisfied by the assignment $\{\sigma_a\} = \{\sigma_{a1}, \sigma_{a2}, \ldots, \sigma_{ak}\}$ if $\{\sigma_a\} \notin Q_a$.

Given parameters $(N, k, r, \alpha, p)$, an instance of model RB is generated as follows:

1. Select (with repetition) $rN\ln N$ random constraints, each of which is formed by selecting (without repetition) $k$ random variables.

2. For each constraint, form an incompatible set by uniformly selecting (without repetition) $pN^{\alpha k}$ elements of $D^k$.

Note that here we consider

$$ p < 1 - \frac{1}{k} \tag{1} $$

in order to rule out constraints that leave too few compatible configurations, and to facilitate the derivation.
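
As an illustration of the two generation steps above, the following Python sketch samples a small instance. It is a minimal sketch under stated assumptions, not the generator used for the published benchmarks: the function names `generate_model_rb` and `is_solution` are hypothetical, variables and values are indexed from 0, and the quantities $N^{\alpha}$, $rN\ln N$ and $p\,d_N^k$ are rounded to the nearest integer, a choice that the definition itself does not specify.

```python
import itertools
import math
import random


def generate_model_rb(n, k, r, alpha, p, seed=None):
    """Sample one model RB instance; returns (domain_size, constraints).

    Each constraint is a pair (scope, forbidden): a tuple of k distinct
    variable indices and the set of incompatible value tuples Q_a.
    """
    rng = random.Random(seed)
    d = max(2, round(n ** alpha))       # domain size d_N = N^alpha (rounded: assumption)
    m = round(r * n * math.log(n))      # number of constraints M = r N ln N (rounded)
    n_bad = round(p * d ** k)           # size of each incompatible set, p * d^k (rounded)
    all_tuples = list(itertools.product(range(d), repeat=k))
    constraints = []
    for _ in range(m):                  # step 1: constraints are drawn with repetition
        scope = tuple(rng.sample(range(n), k))          # k distinct variables
        forbidden = set(rng.sample(all_tuples, n_bad))  # step 2: distinct forbidden tuples
        constraints.append((scope, forbidden))
    return d, constraints


def is_solution(assignment, constraints):
    """True if no constraint sees one of its incompatible tuples."""
    return all(tuple(assignment[v] for v in scope) not in forbidden
               for scope, forbidden in constraints)
```

Enumerating all $d_N^k$ tuples keeps the sketch short but is only practical for small $k$ and small domains.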

Given an instance of model RB, the task is to find a solution, i.e. an assignment that satisfies all the constraints simultaneously. It is easy to see that the total number of configurations is $N^{\alpha N}$, each of which has probability $(1-p)^{rN\ln N}$ of satisfying all the constraints. If we use $X$ to denote the number of solutions of one instance, its expectation over all possible instances can be written as

$$ E(X) = N^{\alpha N}(1-p)^{rN\ln N}. \tag{2} $$

Let

$$ r^* = -\frac{\alpha}{\ln(1-p)}; $$

we can see that for $r > r^*$ the expected number of solutions tends to 0 for large $N$. Using Markov's inequality

$$ P(X > 0) \le E(X), \tag{3} $$

we know that $E(X)$ gives an upper bound on the probability that an instance is satisfiable. So for $r > r^*$, w.h.p. there is no solution in an instance of model RB. For $r < r^*$, although the expected number of solutions is large, the solutions may be distributed non-uniformly over instances: some instances may contain exponentially many solutions while others contain no solution at all.
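
The first moment argument can be checked numerically. The sketch below evaluates $\ln E(X) = N\ln N\,[\alpha + r\ln(1-p)]$, which follows from Eq. (2), together with the threshold $r^*$. The function names are hypothetical, and the logarithm is used because $E(X)$ itself overflows floating point for moderate $N$.

```python
import math


def log_expected_solutions(n, r, alpha, p):
    """ln E(X) = N ln N * (alpha + r ln(1 - p)), from E(X) = N^(alpha N) (1-p)^(r N ln N)."""
    return n * math.log(n) * (alpha + r * math.log(1.0 - p))


def r_star(alpha, p):
    """Value of r at which ln E(X) changes sign: r* = -alpha / ln(1 - p)."""
    return -alpha / math.log(1.0 - p)
```

For example, with $\alpha = 0.8$ and $p = 0.4$ the sign of `log_expected_solutions` flips as `r` crosses `r_star(0.8, 0.4)`.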

Fortunately, in model RB it has been shown [13] that for $r < r^*$ the second moment $E(X^2)$ is asymptotically equal to $E^2(X)$, hence solutions are indeed distributed uniformly over instances. More precisely, as $N \to \infty$, using the Cauchy inequality, for $r < r^*$ we have

$$ P(X > 0) \ge \frac{E^2(X)}{E(X^2)} \to 1. \tag{4} $$



In other words, the satisfiability transition happens at $r^*$:

$$ \lim_{N\to\infty} \Pr(X > 0) = 1 \quad \text{when } r < r^*, $$

$$ \lim_{N\to\infty} \Pr(X > 0) = 0 \quad \text{when } r > r^*. $$


However, even in the satisfiable phase close to the satisfiability transition, where solutions almost surely exist, it is still difficult to find a solution of a random instance. Many efforts have been devoted to designing efficient algorithms that work in this regime. So far, our understanding of this algorithmic hardness is based on the clustering of solutions in the satisfiable regime close to the transition. In statistical physics, the methods used to describe the solution space structure are borrowed from the cavity method in spin glass theory. From a statistical physics point of view, CSPs are nothing but spin glass models at zero temperature, with the energy of the system defined as the number of violated constraints. Thus finding a solution is equivalent to finding a configuration $\{\sigma\} = \{\sigma_i \mid i = 1, \ldots, N\}$ that has zero energy. More precisely, one can define a Gibbs measure

$$ P(\{\sigma\}) = \frac{1}{Z}\, e^{-\beta \sum_a E_a(\{\sigma_a\})}, $$


where $Z$ is the partition function, and $E_a(\{\sigma_a\})$ is 0 if $\{\sigma_a\} \notin Q_a$ and 1 otherwise. By taking $\beta \to \infty$ and using e.g. the cavity method, one can study properties of this Gibbs distribution that reflect the structure of the solution space [6], such as whether the ground-state energy is 0, whether the Gibbs distribution is extremal, whether replica symmetry is broken, etc. Previous studies [MSI have shown that a similar picture of the configuration space structure and phase transitions exists in many interesting constraint satisfaction problems: when the number of constraints is small, replica symmetry holds and the Gibbs measure is extremal. As the number of constraints (or edges in the graph) increases, the system undergoes clustering, condensation and satisfiability transitions in turn. At the clustering transition (also called the dynamical transition), the set of solutions begins to split into an exponential number of pure states, and replica symmetry holds within each pure state. At the condensation transition, the sizes of clusters become inhomogeneous, such that a finite number of clusters contains almost all the solutions. If the number of constraints keeps increasing beyond the satisfiability transition, neither clusters nor solutions exist any more. Note that in some CSPs, like K-SAT with $K = 3$, and in some combinatorial optimization problems, like the independent set problem Eiim with low average degree, the clustering transition and the condensation transition coincide, while for other problems, like K-SAT with $K \ge 4$ and graph coloring, the condensation transition is distinct from the clustering transition, and there is a stable one-step replica symmetry breaking (1RSB) phase.
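
To make the energy function and the Gibbs measure concrete, the following brute-force sketch enumerates all assignments of a tiny instance. It is only an illustration under stated assumptions: the names `energy` and `gibbs_measure` are hypothetical, a constraint is represented as a pair (tuple of variable indices, set of incompatible value tuples), and exhaustive enumeration is feasible only for very small $N$. For large $\beta$ the measure concentrates uniformly on the zero-energy configurations, i.e. on the solutions.

```python
import itertools
import math


def energy(assignment, constraints):
    """Number of violated constraints, i.e. the sum over a of E_a({sigma_a})."""
    return sum(tuple(assignment[v] for v in scope) in forbidden
               for scope, forbidden in constraints)


def gibbs_measure(n, d, constraints, beta):
    """Gibbs probabilities exp(-beta * E) / Z over all d**n assignments."""
    configs = list(itertools.product(range(d), repeat=n))
    weights = [math.exp(-beta * energy(c, constraints)) for c in configs]
    z = sum(weights)  # partition function Z
    return {c: w / z for c, w in zip(configs, weights)}
```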

Studies based on the replica symmetric cavity method and its associated Belief Propagation (BP) equations have been applied to model RB in [19], and the Bethe entropy $S_{\mathrm{Bethe}}$ (the leading order of the logarithm of the number of solutions) has been calculated on single instances. There are two interesting observations in [19]. First, the BP equations always converge on single instances when the energy reported by BP is zero. This means that in the satisfiable phase BP is always marginally stable, indicating that the replica symmetric solution is always locally stable in the satisfiable phase. Second, the Bethe entropy agrees very well with the first moment estimate of the entropy (annealed entropy), $S_{\mathrm{Bethe}} = \ln E(X)$. These two phenomena suggest that the Belief Propagation algorithm may give asymptotically correct marginals and free energy, and that the condensation transition does not exist.

III. SOLUTION SPACE STRUCTURE OF MODEL RB

Heuristic analyses of the solution space structure using the cavity method and replica symmetry breaking are based on the concept of pure states, with assumptions of an extremal Gibbs measure and exponential growth of both the number of clusters and the number of solutions in each cluster. Following [28], in this paper we use a more concrete definition of clusters based on the Hamming distance. Let $S$ denote the set of all solutions of an instance. The Hamming distance between two arbitrary solutions $x, y \in S$, denoted $d(x, y)$, is the number of variables taking different values in $x$ and $y$. We define the diameter of a set of solutions $X \subseteq S$ as the maximum Hamming distance between any two elements of $X$. The distance between two sets $X, Y \subseteq S$ is the minimum Hamming distance between any $x \in X$ and any $y \in Y$. We define a cluster as a connected component of $S$, where $x, y \in S$ are considered adjacent if they are at Hamming distance 1 (or a finite integer $q$; this does not affect the conclusion). We further define a region as a union of some non-empty clusters.
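
The definitions above translate directly into code. The sketch below is illustrative only: the helper names are hypothetical, solutions are assumed to be equal-length tuples, and two solutions are treated as adjacent when their Hamming distance is at most `q` (with `q = 1` this is the distance-1 adjacency used in the text). Clusters are found with a simple union-find over all pairs, which is quadratic in the number of solutions.

```python
from itertools import combinations


def hamming(x, y):
    """Number of variables taking different values in x and y."""
    return sum(a != b for a, b in zip(x, y))


def diameter(region):
    """Maximum Hamming distance between any two elements of a set of solutions."""
    return max((hamming(x, y) for x, y in combinations(region, 2)), default=0)


def set_distance(xs, ys):
    """Minimum Hamming distance between an element of xs and an element of ys."""
    return min(hamming(x, y) for x in xs for y in ys)


def clusters(solutions, q=1):
    """Connected components of the solution set under Hamming adjacency (distance <= q)."""
    sols = list(solutions)
    parent = list(range(len(sols)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(sols)), 2):
        if hamming(sols[i], sols[j]) <= q:
            parent[find(i)] = find(j)
    groups = {}
    for i, s in enumerate(sols):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())
```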


A. Clustering of solutions

Our analysis is based on the number of solution pairs $Z(x)$ at Hamming distance $Nx$, with $0 \le x \le 1$. Again, as it is hard to compute $Z(x)$ exactly, we turn to the expectation of $Z(x)$. As shown in [13], the number of assignment pairs at distance $Nx$ is equal to

$$ t(x) = N^{\alpha N}\binom{N}{Nx}\left(N^{\alpha}-1\right)^{Nx}, $$

and the probability that a pair of assignments are both solutions is written as

$$ q(x) = \left\{(1-p)^2 + p(1-p)\left[(1-x)^k + g(x)\right]\right\}^{rN\ln N}, $$

where

$$ g(x) = -\frac{k(k-1)\,x\,(1-x)^{k-1}}{2N}. $$

So the expectation of $Z(x)$, denoted by $E(Z(x))$, is the product of $t(x)$ and $q(x)$.

Since the domain size grows with $N$ in model RB, it is convenient to define the normalized version of $E(Z(x))$ for given $\alpha$, $p$ and $r$:

$$ f(x) = \lim_{N\to\infty} \frac{\ln E(Z(x))}{N\ln N} = \alpha(1+x) + r\ln\left[(1-p)^2 + p(1-p)(1-x)^k\right]. \tag{5} $$

In fact $f(x)$ is the annealed entropy density, which is a decreasing function of the number of constraints. It is easy to see that when $f(x) < 0$, $E(Z(x)) \to 0$.
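
The function $f(x)$ of Eq. (5) is easy to explore numerically. The sketch below evaluates it, locates its minimum, finds the two zeros when they exist, and estimates the smallest $r$ at which the curve first touches zero. It is a sketch under assumptions: all function names are hypothetical, the ternary search relies on the positive second derivative derived in Appendix A, and `zero_crossings` assumes $r < r^*$ so that $f(0)$ and $f(1)$ are positive.

```python
import math


def f(x, r, alpha, p, k):
    """Annealed entropy density of solution pairs, Eq. (5)."""
    return alpha * (1 + x) + r * math.log((1 - p) ** 2 + p * (1 - p) * (1 - x) ** k)


def r_star(alpha, p):
    """Satisfiability threshold r* = -alpha / ln(1 - p)."""
    return -alpha / math.log(1 - p)


def argmin_f(r, alpha, p, k, iters=200):
    """Ternary search for the minimiser of f on [0, 1] (valid because f'' > 0)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1, r, alpha, p, k) < f(m2, r, alpha, p, k):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def bisect(g, a, b, iters=200):
    """Root of g on [a, b], assuming g(a) and g(b) have opposite signs."""
    sign_a = g(a) > 0
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if (g(mid) > 0) == sign_a:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def zero_crossings(r, alpha, p, k):
    """Return (x1, x2) with f(x1) = f(x2) = 0, or None if f stays non-negative."""
    xm = argmin_f(r, alpha, p, k)
    if f(xm, r, alpha, p, k) >= 0:
        return None
    g = lambda x: f(x, r, alpha, p, k)
    return bisect(g, 0.0, xm), bisect(g, xm, 1.0)


def onset(alpha, p, k):
    """Smallest r at which min_x f(x) reaches 0, by bisection on r in (0, r*)."""
    h = lambda r: f(argmin_f(r, alpha, p, k), r, alpha, p, k)
    return bisect(h, 0.0, r_star(alpha, p))
```

For the parameters used in the figure ($k = 2$, $p = 0.4$, $\alpha = 0.8$), `onset(0.8, 0.4, 2) / r_star(0.8, 0.4)` can be compared with the ratio quoted in the text, and `zero_crossings(0.94 * r_star(0.8, 0.4), 0.8, 0.4, 2)` returns the two labeled zeros.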

In Fig. 1 we plot $f(x)$ as a function of $x$ for $k = 2$, $p = 0.4$, $\alpha = 0.8$ and several different values of $r$. The top line has a relatively small $r$, and we can see that $f(x)$ is above 0. It is worth mentioning that this does not mean there is an exponential number of solution pairs at distance $Nx$, because $f(x)$ is only an upper bound for $\ln(Z(x))/(N\ln N)$, as indicated by Markov's inequality

$$ P(Z(x) > 0) \le E(Z(x)). $$

As $r$ increases, the $f(x)$ curve becomes lower and lower. At a certain value $\tilde{r}$ ($\tilde{r} = 0.8815\,r^*$ in our example in Fig. 1), the $f(x)$ curve reaches 0. Beyond $\tilde{r}$, $f(x) = 0$ has 2 solutions [‡] until $r$ reaches $r^*$. For $r > r^*$, it has been proved m that there is no solution in the system, which is consistent with what the curve shows: the upper bound on $\ln Z(x)$ becomes negative for any value of $x$.

We focus on the regime between $\tilde{r}$ and $r^*$ (the shaded regime in the figure), where $f(x) = 0$ has two solutions, denoted by $x_1$ and $x_2$. Using the definition of $f(x)$ in Eq. (5), we can bound the number of solution pairs at Hamming distances between


[‡] There are at most 2 solutions, which follows from the convexity of $f(x)$ shown in the Appendix.

FIG. 1: Annealed entropy density $f(x)$ (leading order of the logarithm of the number of solution pairs, Eq. (5)), for $\alpha = 0.8$, $p = 0.4$, $k = 2$. From top to bottom, the values of $r$ are $0.81r^*$, $0.8815r^*$, $0.94r^*$ and $r^*$ respectively. In the shaded regime $(0.8815r^*, r^*)$, $f(x) = 0$ has two solutions, denoted by $x_1$ and $x_2$. One example of the two solutions is labeled in the figure for $r = 0.94r^*$.


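$x_1N$ and $x_2N$ as $N \to \infty$:

$$ P\left(\sum_{xN = x_1N+1}^{x_2N-1} Z(x) > 0\right) \le E\left(\sum_{xN = x_1N+1}^{x_2N-1} Z(x)\right) \le (x_2 - x_1)N \cdot \max_{x\in(x_1,x_2)} E(Z(x)) = (x_2 - x_1)N \cdot \max_{x\in(x_1,x_2)} e^{N\ln N\,[f(x)+o(1)]} \to 0. \tag{6} $$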


Thus the last relation indicates that w.h.p. there is no solution pair at Hamming distance between $x_1N$ and $x_2N$. On the other hand, as $N \to \infty$, using the Paley-Zygmund inequality we have


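$$ P\left[X > \frac{1}{N}E(X)\right] \ge \frac{\left(E(X) - \frac{1}{N}E(X)\right)^2}{E(X^2)} = \left(1 - \frac{1}{N}\right)^2 \frac{E^2(X)}{E(X^2)} \to 1, \tag{7} $$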


where we have made use of Eq. (4), namely $\lim_{N\to\infty} E^2(X)/E(X^2) = 1$. Thus the number of solutions $X$ is w.h.p. bigger than its mean divided by $N$.

It follows that w.h.p., in the regime where the pair $(x_1, x_2)$ exists (e.g. the shaded regime in Fig. 1), the system has an exponential number of solutions and their Hamming distances are discontinuously distributed. In other words, the solution space is clustered. In fact we can show that for all parameters of model RB there always exists such a clustered regime. A proof of the existence of the pair $x_1$, $x_2$ is given in Appendix A.


B. Organization of clusters 

In this section we give a precise description of the clustering of solutions, including bounds on the diameter of clusters, the distance between clusters, the number of solutions in one cluster and the number of clusters in the satisfiable phase. Given the result of the last section, using the method of Achlioptas and Ricci-Tersenghi (see section 3 of [35], [10], section 3 of |in|), we have a concrete way to split the solution space and put solutions into different clusters. Assuming we know all the solutions, we can split the solution space by the curved surface $\{y \mid d(x, y) = x_1N\}$ around each solution $x$, and obtain a set of regions $\mathcal{A}$. In more detail, we proceed as follows:

1. Initialize $\mathcal{A} = \{S\}$, with $S$ denoting the set of all solutions.

2. For every solution $x \in S$, repeat the splitting step (step 3) around $x$.

3. Splitting step: denote the only region in $\mathcal{A}$ that includes $x$ by $A$. If there is $y \in A$ satisfying $d(x, y) > x_1N$, then let $B = \{y \mid y \in A,\ d(x, y) \le x_1N\}$, $C = A \setminus B$, and let $\mathcal{A} = (\mathcal{A} \cup \{B\} \cup \{C\}) \setminus \{A\}$.

The final $\mathcal{A}$ is the set of regions we want. We can show that $\mathcal{A}$ has the following properties:

• The diameter of each region is at most $x_1N$, because if there are two solutions at distance more than $x_1N$ in a region, the splitting step will certainly split them into different regions.

• The distance between every pair of regions is at least $(x_2 - x_1)N$. To show this, assume there are three solutions $x$, $y$ and $z$ that are put into two regions by the splitting step around $x$: $y$ is put into the same region as $x$, and $z$ is put into a different region. Then we have $d(x, y) \le x_1N$ and, since w.h.p. no pair of solutions lies at a distance between $x_1N$ and $x_2N$, $d(x, z) \ge x_2N$; the triangle inequality then implies that $d(y, z) \ge (x_2 - x_1)N$.
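
The splitting procedure can be sketched in a few lines of Python. This is an illustration under assumptions rather than the authors' implementation: the names `split_into_regions` and `hamming` are hypothetical, solutions are assumed to be distinct hashable tuples, and the argument `radius` plays the role of $x_1N$. The diameter and separation properties listed above rely on the absence of solution pairs at distances between $x_1N$ and $x_2N$, which the code does not check.

```python
def hamming(x, y):
    """Number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def split_into_regions(solutions, radius):
    """Split the solution set into regions following steps 1-3 of the text."""
    solutions = list(solutions)
    regions = [set(solutions)]                        # step 1: start from the single region S
    for x in solutions:                               # step 2: visit every solution
        idx = next(i for i, reg in enumerate(regions) if x in reg)
        region = regions[idx]                         # the only region containing x
        if any(hamming(x, y) > radius for y in region):
            # step 3: separate the solutions close to x from the rest of the region
            near = {y for y in region if hamming(x, y) <= radius}
            far = region - near
            regions[idx:idx + 1] = [near, far]
    return regions
```

On an input with a gap in the distance distribution, for instance two tight groups of solutions that are far from each other, the returned regions coincide with the groups.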

Another important property we are interested in is the number of solutions in clusters. Here, for convenience, we talk about typical instances of model RB, only to avoid repeated use of “w.h.p.”. From the above analysis we know that the diameter of each region is at most $x_1N$, so the number of solution pairs in one cluster is bounded above by the number of solution pairs at Hamming distance at most $x_1N$. Letting

$$ l = \max_{x\in[0,x_1]} E(Z(x)) \tag{8} $$


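and using Markov's inequality we have

$$ P\left(\sum_{xN=1}^{x_1N} Z(x) > N^2 l\right) \le \frac{E\left(\sum_{xN=1}^{x_1N} Z(x)\right)}{N^2 l} \le \frac{Nl}{N^2 l} = \frac{1}{N}. \tag{9} $$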


We can see that every region in $\mathcal{A}$ has at most $N^2 l$ pairs of solutions, which implies that every region in $\mathcal{A}$ has at most $N\sqrt{l}$ solutions. We know that $f(x)$ is a convex function and is monotonically decreasing for $x \in [0, x_1]$ (see Appendix A for a proof), so when $N$ is very large the number of solutions in one cluster is smaller than

$$ N\sqrt{l} \le N\sqrt{E(X)}. $$


Note that compared with the lower bound on the total number of solutions, $\frac{1}{N}E(X)$ (Eq. (7)), the number of solutions in one region is exponentially smaller. To make this more precise, if we define a complexity function $\Sigma$ representing the leading order of the logarithm of the number of clusters divided by $N\ln N$ (note that in our system the correct scaling for densities is $N\ln N$), we have for very large $N$

$$ \Sigma \ge \frac{1}{N\ln N}\ln\frac{\frac{1}{N}E(X)}{N\sqrt{l}} \ge \frac{1}{2}\left[\alpha + r\ln(1-p)\right] - \frac{2}{N}. \tag{10} $$


The last relation says that in the satisfiable phase the complexity is positive all the way to the satisfiability transition.
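
The second inequality in Eq. (10) can be checked with a short computation. The following is a sketch that assumes the bound $l \le E(X)$ for large $N$ (the maximum of $E(Z(x))$ on $[0, x_1]$ being attained at $x = 0$, where $Z(0) = X$) and uses $\ln E(X) = N\ln N\,[\alpha + r\ln(1-p)]$ from Eq. (2):

```latex
\begin{align*}
\frac{1}{N\ln N}\,\ln\frac{\frac{1}{N}E(X)}{N\sqrt{E(X)}}
  &= \frac{1}{N\ln N}\left[\tfrac{1}{2}\ln E(X) - 2\ln N\right] \\
  &= \tfrac{1}{2}\left[\alpha + r\ln(1-p)\right] - \frac{2}{N}.
\end{align*}
```

The bracket $\alpha + r\ln(1-p)$ is positive exactly when $r < r^*$, so the bound stays positive throughout the satisfiable phase once $N$ is large enough.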

A direct implication of the above results is that, in the whole parameter range, the phase diagram of model RB does not contain a condensed clustered phase, because there does not exist a finite set of clusters that contains almost all the solutions. In replica symmetry breaking theory, the existence of the clustering transition is indicated by $\Sigma(m = 1) > 0$ and the existence of condensation is indicated by $\Sigma(m = 1) < 0$, where $\Sigma$ denotes the complexity, i.e. the leading order of the logarithm of the number of pure states, as a function of the Parisi parameter $m$ [5]. With Parisi parameter $m = 1$, the first-step replica symmetry breaking solution gives equal weight to each pure state, thus the total free energy is identical to the replica symmetric free energy. We can see that our definition of the complexity $\Sigma$ is very similar to $\Sigma(m = 1)$ because it gives equal weight to different clusters. Thus $\Sigma > 0$ all the way to the satisfiability transition is another way to show that there is no condensation transition in model RB. Note that since our definition of clusters is different from that of pure states (as we do not refer to properties of the Gibbs measure), our claim is not a proof.

IV. CONCLUSION AND DISCUSSION

In conclusion, in this paper we have described in detail the solution space structure of model RB using rigorous methods. We have shown that close to the satisfiability transition, solutions are clustered into an exponential number of clusters, each of which contains a sub-exponential number of solutions. We have also shown that there is no condensation transition in model RB, which confirms a statement of Zhao et al. [19] obtained using non-rigorous cavity methods from statistical physics.

The factor graph of model RB has the special feature that the degree of each variable is very large (growing with the number of variables $N$), as in K-SAT with growing $K$ [SD]. We think this feature will affect the phase transitions, and we will investigate it in future work.

We note that although we proved the clustering of solutions close to the satisfiability transition, we are still not sure where the clustered phase begins. Although rigorous methods are lacking, heuristically the clustering transition can be estimated as the point where the one-step replica symmetry breaking cavity method at Parisi parameter $m = 1$ begins to have a non-trivial solution. We will address this point in future work.

It has been shown that freezing of clusters, rather than clustering, is the real reason for algorithmic hardness. Numerical experiments in [19] and [18] showed that starting from $\tilde{r}$ (where $f(x) = 0$ has only one solution), the most efficient algorithms begin to fail to find solutions, which suggests that clusters become frozen immediately at $\tilde{r}$. This would be interesting to study in detail.

Appendix A: Convexity of $f(x)$


The first and second derivatives of $f(x)$ with respect to $x$ read

$$ \frac{df}{dx} = \alpha - \frac{rpk(1-x)^{k-1}}{1 - p + p(1-x)^k}, $$

$$ \frac{d^2f}{dx^2} = \frac{rpk}{\left[1 - p + p(1-x)^k\right]^2}\left[(k-1)(1-p)(1-x)^{k-2} - p(1-x)^{2k-2}\right]. $$


Then it is easy to check that $\frac{d^2f}{dx^2}$ is always positive for $x \in [0, 1)$ with $k \ge 2$ and $p < 1 - \frac{1}{k}$, which implies the convexity of $f(x)$.
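
The sign of the second derivative can be read off from the bracketed factor. As a short worked step, using only $0 \le x < 1$, $k \ge 2$ and the condition $p < 1 - 1/k$:

```latex
\begin{align*}
(k-1)(1-p)(1-x)^{k-2} - p(1-x)^{2k-2}
  &= (1-x)^{k-2}\left[(k-1)(1-p) - p(1-x)^{k}\right] \\
  &\ge (1-x)^{k-2}\left[(k-1)(1-p) - p\right] \\
  &= (1-x)^{k-2}\,(k - 1 - kp).
\end{align*}
```

The last factor satisfies $k - 1 - kp > 0$ exactly when $p < 1 - 1/k$, which is the condition imposed in Eq. (1).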

Observe that both $f(0)$ and $f(1)$ are positive in the satisfiable phase, that $f(0) = f(1) = 0$ at the satisfiable-unsatisfiable transition, and that $\left.\frac{df}{dx}\right|_{x=1} = \alpha > 0$. So, using the convexity of $f(x)$, there must exist $r < r^*$ and a pair $x_1, x_2$ such that $f(x) < 0$ for $x_1 < x < x_2$. Moreover, in the satisfiable phase, given that $f(0) = \alpha + r\ln(1-p) > 0$ and that $x_1$ is the first point where $f(x)$ reaches 0, we can conclude that $f(x)$ is a monotonically decreasing function for $x \in [0, x_1]$.